feat(backup): incremental backup system for ~/.obk data #114

Merged
priyanshujain merged 49 commits into master from backups
Mar 22, 2026
@priyanshujain
Collaborator

Summary

Implements an incremental backup system (Issue #78) that uploads changed files from ~/.obk to Cloudflare R2 or Google Drive.

  • Content-addressable storage: SHA-256 hashing + zstd compression, deduplicates unchanged files across snapshots
  • Safe SQLite snapshots: Uses VACUUM INTO to create consistent point-in-time copies without WAL corruption
  • Two backends: Cloudflare R2 (S3-compatible via minio-go) and Google Drive (reuses existing OAuth)
  • Incremental: Only uploads files whose hash changed since last backup
  • Settings TUI: Backup category with validation — can't enable without destination + credentials configured
  • Setup wizard: obk setup flow for configuring R2 or Google Drive with credential validation
  • CLI commands: obk backup now|list|status|restore <snapshot-id>
  • Daemon job: Periodic backup via River worker (configurable: 6h/12h/24h/manual)

Remote storage layout

snapshots/<timestamp>.json    # Manifest with file hashes + sizes
objects/<2-char-prefix>/<sha256-hex>  # Compressed file blobs
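
The layout above can be sketched as two small key builders. This is a minimal illustration, not the PR's code: the function names `objectKey` and `snapshotKey` are hypothetical, and it assumes the hash is taken over the file content before compression.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// objectKey returns the remote key for a blob, following the layout
// objects/<2-char-prefix>/<sha256-hex>. Hypothetical helper name.
func objectKey(content []byte) string {
	sum := sha256.Sum256(content)
	hexSum := hex.EncodeToString(sum[:])
	return "objects/" + hexSum[:2] + "/" + hexSum
}

// snapshotKey returns the remote key for a snapshot manifest, following
// the layout snapshots/<timestamp>.json. Hypothetical helper name.
func snapshotKey(timestamp string) string {
	return "snapshots/" + timestamp + ".json"
}
```

Because the key depends only on content, two snapshots that contain the same file point at the same object, which is what makes the deduplication described in the summary possible.
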

Files added/modified

  • service/backup/ — Core service: backend interface, manifest, scanner, vacuum, R2, GDrive, restore
  • config/config.go — BackupConfig, R2Config, GDriveConfig structs
  • config/paths.go — BackupDir(), BackupStagingDir(), BackupLastManifestPath()
  • settings/registry.go — Backup category with field-level validation and ReadOnly guards
  • internal/cli/setup.go — Setup wizard for R2 and Google Drive
  • internal/cli/backup/ — CLI subcommands
  • daemon/jobs/backup.go — River periodic worker
  • daemon/river.go — Worker registration + periodic job scheduling

Test plan

  • 17 backup service tests (manifest diff, scanner patterns, LocalBackend CRUD, vacuum, full Run flow, restore, incremental changes)
  • 12 settings validation tests (enable/disable flows, ReadOnly guards per destination, credential checks)
  • All existing tests pass (go test ./...)
  • Manual E2E: obk setup → configure R2 → obk backup now → obk backup list → obk backup restore
  • Manual E2E: obk setup → configure Google Drive → same flow
  • Verify daemon periodic job fires on schedule

…ckup paths

Adds config types for the backup system: destination (r2/gdrive),
schedule, and credential refs. Also adds BackupDir, BackupStagingDir,
and BackupLastManifestPath helper functions.

Defines the storage backend interface (Put/Get/Head/List/Delete) and
a LocalBackend for testing against the local filesystem.
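
A rough sketch of what such an interface might look like follows. The method signatures are assumptions (the commit message names only the five operations), and `memBackend` is an in-memory stand-in used here in place of the filesystem-backed LocalBackend.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Backend mirrors the five operations named in the commit message.
// The signatures are assumptions made for this sketch.
type Backend interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
	Head(key string) (bool, error) // reports whether key exists
	List(prefix string) ([]string, error)
	Delete(key string) error
}

// memBackend keeps objects in a map; it is a hypothetical test double.
type memBackend struct {
	objects map[string][]byte
}

// Compile-time check that memBackend satisfies Backend.
var _ Backend = (*memBackend)(nil)

func newMemBackend() *memBackend {
	return &memBackend{objects: make(map[string][]byte)}
}

func (m *memBackend) Put(key string, data []byte) error {
	m.objects[key] = append([]byte(nil), data...) // copy so callers cannot mutate stored data
	return nil
}

func (m *memBackend) Get(key string) ([]byte, error) {
	data, ok := m.objects[key]
	if !ok {
		return nil, fmt.Errorf("get %q: no such object", key)
	}
	return append([]byte(nil), data...), nil
}

func (m *memBackend) Head(key string) (bool, error) {
	_, ok := m.objects[key]
	return ok, nil
}

func (m *memBackend) List(prefix string) ([]string, error) {
	var keys []string
	for k := range m.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys, nil
}

func (m *memBackend) Delete(key string) error {
	delete(m.objects, key)
	return nil
}
```

Keeping the interface this small is what lets R2, Google Drive, and a test double share the same backup and restore code paths.
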
Manifest tracks file hashes, sizes, and compressed sizes per snapshot.
DiffManifest compares against previous to find changed/removed files.
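
The diff step might look roughly like the sketch below. The `Manifest` and `FileEntry` shapes are assumptions based on the fields the commit message lists (hash, size, compressed size), and `diffManifest` is a hypothetical lowercase counterpart of DiffManifest.

```go
package main

import "sort"

// FileEntry holds the per-file data the commit message says a manifest tracks.
// Field names are assumptions made for this sketch.
type FileEntry struct {
	Hash           string // hex SHA-256 of the file content
	Size           int64
	CompressedSize int64
}

// Manifest maps a path relative to ~/.obk to its entry.
type Manifest struct {
	Files map[string]FileEntry
}

// diffManifest reports paths that are new or whose hash changed (changed)
// and paths present only in the previous manifest (removed).
func diffManifest(prev, cur Manifest) (changed, removed []string) {
	for path, entry := range cur.Files {
		old, ok := prev.Files[path]
		if !ok || old.Hash != entry.Hash {
			changed = append(changed, path)
		}
	}
	for path := range prev.Files {
		if _, ok := cur.Files[path]; !ok {
			removed = append(removed, path)
		}
	}
	sort.Strings(changed)
	sort.Strings(removed)
	return changed, removed
}
```

With an empty previous manifest every file counts as changed, so the first backup uploads everything and later runs upload only the difference.
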
Walks ~/.obk and matches files against configurable include patterns
(data.db, config files, learnings, skills) while excluding WAL files,
logs, scratch dirs, and the backup dir itself.
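
As an illustration of the include/exclude rules, a matcher could be written as below. The concrete patterns, suffixes, and directory names are placeholders chosen for the example; the PR's actual lists are not shown in this page.

```go
package main

import (
	"path"
	"strings"
)

// All three lists are illustrative assumptions, not the PR's real values.
var (
	includePatterns = []string{"data.db", "*.json", "*.md"}
	excludeSuffixes = []string{"-wal", "-shm", ".log"}
	excludeDirs     = []string{"backup", "scratch"}
)

// shouldBackUp decides whether a slash-separated path relative to ~/.obk
// belongs in the backup set. Excludes win over includes.
func shouldBackUp(rel string) bool {
	rel = path.Clean(rel)
	for _, segment := range strings.Split(path.Dir(rel), "/") {
		for _, dir := range excludeDirs {
			if segment == dir {
				return false
			}
		}
	}
	base := path.Base(rel)
	for _, suffix := range excludeSuffixes {
		if strings.HasSuffix(base, suffix) {
			return false
		}
	}
	for _, pattern := range includePatterns {
		if ok, _ := path.Match(pattern, base); ok {
			return true
		}
	}
	return false
}
```

Checking excludes first means a WAL file such as `gmail/data.db-wal` is dropped even though its name starts with an included pattern.
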
Uses VACUUM INTO to create a consistent point-in-time copy of each
.db file into a staging directory, avoiding WAL corruption issues.

Implements the main backup pipeline: scan files, VACUUM INTO for .db
files, SHA-256 hash, diff against last manifest, zstd compress and
upload changed files, then save new manifest.

Implements the Backend interface for Cloudflare R2 (S3-compatible).
Includes ValidateR2() for connection testing during setup.

Implements Backend interface for Google Drive using the drive.file
scope. Includes FindOrCreateDriveFolder() for setup.

Restore downloads objects from a snapshot, decompresses them, and
writes to ~/.obk. ListSnapshots and GetManifest support the CLI.

Adds "Backup" to the source multi-select. The setup flow prompts for
R2 or Google Drive destination, validates connection, stores credentials
in keychain, and lets user pick a backup schedule.

Adds Backup section with enabled, destination, schedule fields plus
R2 (bucket, endpoint, access key, secret key) and Google Drive
(folder ID) sub-categories. Credentials use keyring via TypePassword.

Adds `obk backup now|list|status|restore` subcommands. The resolve
helper creates the appropriate backend from config + keyring.

Registers BackupWorker with River and adds a periodic job based on
the configured backup schedule (6h/12h/24h). Skips if backup is
not enabled or not linked.
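
Turning the configured schedule string into an interval could look like the sketch below. The function name `scheduleInterval` is hypothetical, and the "manual" value comes from the PR summary rather than this commit; how the interval is handed to River is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// scheduleInterval converts a schedule setting into a duration.
// periodic is false for "manual", meaning no periodic job should be registered.
func scheduleInterval(schedule string) (interval time.Duration, periodic bool, err error) {
	switch schedule {
	case "manual":
		return 0, false, nil
	case "6h", "12h", "24h":
		interval, err = time.ParseDuration(schedule)
		if err != nil {
			return 0, false, err
		}
		return interval, true, nil
	default:
		return 0, false, fmt.Errorf("unsupported backup schedule %q", schedule)
	}
}
```

Restricting the accepted values to an explicit list keeps a typo in the config file from silently scheduling a backup every few seconds.
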
Tests manifest diff, load/save, file scanner include/exclude rules,
local backend CRUD operations, full backup flow with hashing and
compression, object key generation, and compress/decompress round-trip.

Updates expected node count from 6 to 7 and adds "Backup" to the
expected labels list.

VacuumInto now takes relPath to preserve directory structure in staging
(gmail/data.db and whatsapp/data.db no longer collide). Service now
stores manifestPath and stagingDir as fields, with NewWithPaths() for
tests to inject temp dirs.

Adds 17 tests covering: Service.Run() full flow, incremental backup,
Restore with file verification, ListSnapshots empty/populated,
GetManifest, VacuumInto with real SQLite, VacuumInto no-collision
for same-named DBs in different dirs, Run with SQLite VACUUM INTO +
restore round-trip, and expanded scanner/backend edge cases.
Coverage: 26.8% → 51.7% (remaining 0% is R2/GDrive requiring infra).

backup.enabled Set() now refuses to enable unless the active
destination has all required fields (R2: bucket, endpoint, both keys;
GDrive: folder ID). R2/GDrive fields are ReadOnly when not the active
destination. Schedule is ReadOnly until a destination is configured.
Adds 12 tests covering all validation paths.
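
The enable guard described in this commit can be sketched as follows. The struct names BackupConfig, R2Config, and GDriveConfig appear in the PR, but the field names and the helpers `missingFields` and `setEnabled` are assumptions; the real code keeps credentials in the keyring, which this sketch flattens into plain fields.

```go
package main

import (
	"fmt"
	"strings"
)

// Field names below are assumptions made for this sketch.
type R2Config struct{ Bucket, Endpoint, AccessKey, SecretKey string }
type GDriveConfig struct{ FolderID string }

type BackupConfig struct {
	Enabled     bool
	Destination string // "r2" or "gdrive"
	R2          R2Config
	GDrive      GDriveConfig
}

// missingFields lists what the active destination still needs.
func missingFields(c BackupConfig) []string {
	var missing []string
	require := func(name, value string) {
		if value == "" {
			missing = append(missing, name)
		}
	}
	switch c.Destination {
	case "r2":
		require("bucket", c.R2.Bucket)
		require("endpoint", c.R2.Endpoint)
		require("access key", c.R2.AccessKey)
		require("secret key", c.R2.SecretKey)
	case "gdrive":
		require("folder ID", c.GDrive.FolderID)
	default:
		missing = append(missing, "destination")
	}
	return missing
}

// setEnabled refuses to turn backup on while required fields are missing.
// Turning it off is always allowed.
func setEnabled(c *BackupConfig, enabled bool) error {
	if enabled {
		if missing := missingFields(*c); len(missing) > 0 {
			return fmt.Errorf("cannot enable backup, missing: %s", strings.Join(missing, ", "))
		}
	}
	c.Enabled = enabled
	return nil
}
```
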
…ination authenticated

- backup.enabled is ReadOnly (greyed out) until destination is fully
  configured with credentials; always editable when already enabled so
  user can disable
- Changing backup.destination resets enabled to false (forces re-validation)
- backup.schedule is ReadOnly until backup is enabled (not just configured)
- Added tests: ReadOnly guards, destination change resets enabled,
  same-value destination keeps enabled
…verify → schedule

When user clicks "Enabled" or "Destination" in backup settings, a wizard
starts automatically (like the LLM profile wizard):

1. Select destination (R2 or Google Drive)
2. Enter credentials (R2: bucket/endpoint/keys, GDrive: folder ID)
3. Verify connection (R2 tested via S3 API)
4. Select schedule
5. Auto-enable and save

No more individual field editing — the wizard chains all steps.
…atically

Instead of asking for a raw folder ID, the GDrive backup wizard now:
1. Asks for a folder name (default "obk-backup")
2. Runs Google OAuth flow (opens browser for authentication)
3. Creates or finds the folder in Google Drive
4. Sets folder ID automatically

Added SetupGDrive callback to settings.Service, wired up in settings_cmd.go
with the full OAuth + FindOrCreateDriveFolder flow.
…play

The folder ID is an internal value that means nothing to users. It's
set automatically by the wizard/setup flow. No reason to show or edit it.

R2 credential fields now explain where to find each value on the
Cloudflare Dashboard. GDrive setup goes straight to OAuth with
default "obk-backup" folder name (no unnecessary prompt).

R2 sub-category only appears in the settings tree when destination
is set to "r2". Tree rebuilds after any backup field edit so the
category appears/disappears dynamically.

R2 fields are now hidden (not just read-only) when destination != r2.
Tests verify fields are absent when gdrive, present when r2.

Settings service now supports a triggerBackup callback for running
a backup after config changes, and exposes IsBackupDestConfigured
to check if a destination has valid credentials.

- Remove schedule step from wizard (default to 6h)
- All config changes are transactional: revert on Esc or failure
- Destination change: if already authenticated, swap immediately;
  otherwise run auth flow, revert if incomplete
- First-time wizard skips destination picker if dest already set
- Trigger backup when stale after: wizard complete, dest swap,
  or re-enable toggle

Resolves R2 or GDrive backend and runs backup synchronously when
the TUI determines a backup is stale after config changes.

Default to 6h schedule. Users can change it later in obk settings.

Verify R2 and GDrive destination configuration checks for partial
and complete config states.

Tests cover: deep clone, schedule parsing, rollback, wizard state
transitions, destination swap config, save defaults.

VACUUM INTO doesn't support parameterized queries, so a filename
containing a single quote would break or inject into the SQL statement.
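
One common way to make such a statement safe is to double every single quote inside the SQL string literal, which is how SQLite escapes quotes in literals. The commit message does not say which fix was applied, so the sketch below is only an illustration, and `vacuumIntoSQL` is a hypothetical name.

```go
package main

import "strings"

// vacuumIntoSQL builds a VACUUM INTO statement with the destination path
// embedded as a SQL string literal. Single quotes are doubled so the path
// cannot terminate the literal early.
func vacuumIntoSQL(dest string) string {
	escaped := strings.ReplaceAll(dest, "'", "''")
	return "VACUUM INTO '" + escaped + "'"
}
```

For a staging path like `/tmp/it's.db` this yields `VACUUM INTO '/tmp/it''s.db'`, which SQLite reads as a single literal.
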
Shows file count and target directory, asks for y/N confirmation.
Use --force to skip for scripting.

A folder or file name containing a single quote (e.g. "it's-a-backup")
would break the Drive API query. Escape with backslash per Drive API spec.
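
A minimal sketch of that escaping, applied to a folder lookup query, is shown below. The helper names `escapeDriveValue` and `folderQuery` are hypothetical, and the exact query the PR builds is an assumption.

```go
package main

import "strings"

// escapeDriveValue escapes single quotes with a backslash so the value can
// sit inside a quoted string in a Drive search query.
func escapeDriveValue(value string) string {
	return strings.ReplaceAll(value, "'", `\'`)
}

// folderQuery builds a search query for a non-trashed folder with the given
// name. The query shape is an assumption made for this sketch.
func folderQuery(name string) string {
	return "name = '" + escapeDriveValue(name) +
		"' and mimeType = 'application/vnd.google-apps.folder' and trashed = false"
}
```
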
…tion

Backend resolution logic was duplicated in daemon/jobs, cli/backup, and
cli/settings_cmd. Now all three use backupsvc.ResolveBackend with
injected credential resolver and Google client factory.

Two backups within the same second (e.g. daemon + manual) would produce
the same ID and overwrite each other. Appending 4 random bytes ensures
uniqueness.
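
The ID scheme could be sketched like this. The timestamp layout and the names `snapshotIDAt` and `newSnapshotID` are assumptions; only the "timestamp plus 4 random bytes" idea comes from the commit message.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// snapshotIDAt builds an ID from a second-resolution UTC timestamp and
// 4 random bytes rendered as 8 hex characters.
func snapshotIDAt(now time.Time) (string, error) {
	var suffix [4]byte
	if _, err := rand.Read(suffix[:]); err != nil {
		return "", err
	}
	stamp := now.UTC().Format("20060102T150405Z") // layout is an assumption
	return fmt.Sprintf("%s-%s", stamp, hex.EncodeToString(suffix[:])), nil
}

// newSnapshotID returns an ID for the current time.
func newSnapshotID() (string, error) {
	return snapshotIDAt(time.Now())
}
```

Four random bytes give about 4.3 billion possible suffixes per second, so a daemon run and a manual run that start in the same second are very unlikely to collide, and IDs still sort by time.
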
…o memory

compressFile now uses io.Copy from the file to the zstd writer,
avoiding loading the entire file contents into memory before
compression.
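
The streaming pattern can be sketched with the standard library alone. Note that this example swaps in `compress/gzip` for zstd so that it runs without third-party packages; the function names are hypothetical, and a zstd encoder that implements `io.WriteCloser` would take gzip's place in the same shape.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// compressStream copies src into a compressing writer wrapped around dst.
// io.Copy moves data in fixed-size chunks, so memory use does not grow with
// the size of the input. gzip is a stand-in for zstd in this sketch.
func compressStream(dst io.Writer, src io.Reader) (int64, error) {
	zw := gzip.NewWriter(dst)
	n, err := io.Copy(zw, src)
	if err != nil {
		zw.Close()
		return n, err
	}
	// Close flushes buffered data and writes the trailer; its error matters.
	return n, zw.Close()
}

// decompressStream reverses compressStream.
func decompressStream(dst io.Writer, src io.Reader) (int64, error) {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	return io.Copy(dst, zr)
}

// roundTrip compresses and then decompresses data, for quick verification.
func roundTrip(data []byte) ([]byte, error) {
	var packed, out bytes.Buffer
	if _, err := compressStream(&packed, bytes.NewReader(data)); err != nil {
		return nil, err
	}
	if _, err := decompressStream(&out, &packed); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}
```

In the file-based case, `src` would be an `*os.File` opened for reading, so only a chunk of a large database copy is in memory at a time.
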
…kage

backupDest was defined identically in both settings/registry.go and
internal/settings/tui/backup_wizard.go. Export it from settings and
have tui call settings.BackupDest.

filepath.Walk follows symlinks, which could cause infinite recursion
if a symlink points to a parent directory. Switch to filepath.WalkDir
and skip symlink entries.
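
A minimal sketch of a WalkDir-based scan that skips symlink entries follows; `listRegularFiles` and `demo` are hypothetical names. Go's documentation states that WalkDir does not follow symbolic links found in directories, so the explicit check here mainly keeps the link entries themselves out of the result.

```go
package main

import (
	"io/fs"
	"os"
	"path/filepath"
)

// listRegularFiles returns the regular files under root, relative to root.
// Symlink entries are skipped rather than recorded or followed.
func listRegularFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // d may be nil when err is set, so check err first
		}
		if d.Type()&fs.ModeSymlink != 0 {
			return nil
		}
		if d.Type().IsRegular() {
			rel, err := filepath.Rel(root, path)
			if err != nil {
				return err
			}
			files = append(files, rel)
		}
		return nil
	})
	return files, err
}

// demo builds a temp dir holding one file and a symlink back to the dir
// itself, then scans it. It requires a platform that supports symlinks.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "obk-scan-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.WriteFile(filepath.Join(root, "config.json"), []byte("{}"), 0o600); err != nil {
		return nil, err
	}
	if err := os.Symlink(root, filepath.Join(root, "loop")); err != nil {
		return nil, err
	}
	return listRegularFiles(root)
}
```
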
…backend

Head and Delete used strings.Contains(err, "not found") which is
fragile. Define errNotFound sentinel and use errors.Is for matching.
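
The sentinel pattern can be sketched as below. The sentinel name errNotFound comes from the commit message; the `store` type and its methods are hypothetical stand-ins for a real backend.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is the sentinel that callers match with errors.Is.
var errNotFound = errors.New("backup: object not found")

// store is a hypothetical in-memory stand-in for a backend.
type store struct {
	objects map[string][]byte
}

// get wraps the sentinel with %w so context is added without breaking errors.Is.
func (s *store) get(key string) ([]byte, error) {
	data, ok := s.objects[key]
	if !ok {
		return nil, fmt.Errorf("get %q: %w", key, errNotFound)
	}
	return data, nil
}

// head reports existence; a missing object is a normal "false", not an error.
func (s *store) head(key string) (bool, error) {
	_, err := s.get(key)
	if errors.Is(err, errNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// remove treats deleting a missing object as a no-op.
func (s *store) remove(key string) error {
	if _, err := s.get(key); err != nil {
		if errors.Is(err, errNotFound) {
			return nil
		}
		return err
	}
	delete(s.objects, key)
	return nil
}
```

Unlike substring matching, errors.Is keeps working when the message text changes or the error is wrapped with extra context.
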
Verifies that Restore returns an error when the backend objects
referenced in the snapshot manifest have been deleted.

Shows human-readable local time alongside the snapshot ID instead
of just dumping the raw ID.

Hostname isn't useful in a personal tool. Replace with time-since-last
backup and show timestamp in local timezone instead of UTC.

Adds modes, new LLM providers, services, daemon, and storage layers.
@priyanshujain priyanshujain merged commit 9b04712 into master Mar 22, 2026
8 checks passed